In this paper, we present a model-driven methodology and toolset for automatic 
generation of hypertext system repositories. Our code generator, called Bamboo, is based 
on a Containment Modeling Framework (CMF) that uniformly describes data models for 

We offer an overview of current Web search engine design. After introducing a generic 
search engine architecture, we examine each engine component in turn. We cover 
crawling, local Web page storage, indexing, and the use of link analysis for boosting 
search performance. The most common design and implementation techniques for each of 
these components are presented. For this presentation we draw from the literature and 
from our own experimental search engine testbed. Emphasis is on introduci ... 

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an 
application written on the platform to perform automated semantic tagging of large 
corpora. We apply SemTag to a collection of approximately 264 million web pages, and 

In this paper, we propose a new approach to discover informative contents from a set of 
tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first 
partitions a page into several content blocks according to the HTML <TABLE> tags in the 
page. Based on the occurrence of the features (terms) in the set of pages, it calculates the 
entropy value of each feature. According to the entropy value of each feature in a content 
block, the entropy value of the block is defined. By analyz ... 

Keywords: entropy, information extraction, information retrieval, informative content 
discovery 
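
The entropy step described in the InfoDiscoverer abstract above can be illustrated with a minimal Python sketch. The abstract is truncated and does not give the formulas, so everything here is an assumption made for illustration: a feature's entropy is computed over its occurrence distribution across pages with the logarithm taken to base N (the number of pages), and a block's entropy is taken as the mean entropy of its terms. The names `feature_entropies` and `block_entropy` are hypothetical.

```python
import math
from collections import Counter


def feature_entropies(pages):
    """Entropy of each term over its distribution across pages.

    pages: list of pages; each page is a list of content blocks;
    each block is a list of terms. (Assumed input layout.)
    """
    n = len(pages)
    # Term counts per page, ignoring block boundaries.
    counts = [Counter(t for block in page for t in block) for page in pages]
    vocab = set().union(*counts)
    if n < 2:
        # With a single page every distribution is degenerate.
        return {term: 0.0 for term in vocab}
    entropies = {}
    for term in vocab:
        total = sum(c[term] for c in counts)
        h = 0.0
        for c in counts:
            if c[term]:
                p = c[term] / total
                # Log base n keeps the value in [0, 1] (an assumption).
                h -= p * math.log(p, n)
        entropies[term] = h
    return entropies


def block_entropy(block, entropies):
    """Assumed definition: mean entropy of the terms in the block."""
    if not block:
        return 0.0
    return sum(entropies[t] for t in block) / len(block)


pages = [
    [["home", "news"], ["alpha"]],
    [["home", "news"], ["beta"]],
]
ent = feature_entropies(pages)
navigation = block_entropy(["home", "news"], ent)  # terms spread over all pages
content = block_entropy(["alpha"], ent)            # term confined to one page
```

Under these assumptions, terms spread evenly across all pages score near 1 and terms confined to a single page score 0, so a block made of site-wide navigation terms gets a higher entropy than a block of page-specific terms.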

Many Web information services utilize techniques of information extraction (IE) to collect 
important facts from the Web. To create more advanced services, one possible method is 
to discover thematic information from the collected facts through text classification. 
However, most conventional text classification techniques rely on manually labelled corpora 
and are thus ill-suited to cooperate with Web information services with open domains. In 
this work, we present a system named LiveClassifier that ... 

Keywords: text classification, topic hierarchy, web mining 

Metadata is ordinarily used to describe documents, but it can also constitute a form of 
infrastructure for access to networked resources and for traversal of those resources. One 
problematic area for access to digital library resources has been the search for time 
periods or events. If there is a capability to search for time, it is usually a date search - a 
standardized and precise form but unfortunately rarely used in common chronological 
expressions. For example, a user interested in the "Vie ... 

Keywords: metadata infrastructure, time period directories 

This paper presents a similarity retrieval engine, SIREN, that allows posing similarity 
queries in a relational DBMS using an extended syntax that adds support for such 
queries to the SQL language. It discusses the main architecture of SIREN, 
describes some key features, and provides a description of the demo. 

Often scientists need to locate appropriate software for their problems and then select 
from among many alternatives. We have previously proposed an approach for dealing 
with this task by processing performance data of the targeted software. This approach has 
been tested using a customized implementation referred to as PYTHIA. This experience 
made us realize the complexity of the algorithmic discovery of knowledge from 
performance data and of the management of these data together with the d ... 

Keywords: data mining, inductive logic programming, knowledge discovery in databases, 
knowledge-based systems, performance evaluation, recommender systems, scientific 
software 

This report proposes a Reference Model (RM) for database management system (DBMS) 
standardization. A Reference Model is a conceptual framework whose purpose is to divide 
standardization work into manageable pieces and to show at a general level how these 
pieces are related to each other. The proposed RM comprises a Data Mapping Control 
System (DMCS) that retrieves and stores application data, application schemas, and data 
dictionary schemas. This DMCS is bounded by two interfaces: the Data Lan ... 

Unlike in traditional information retrieval, both content and structure are critical to 
the success of Web information retrieval. In recent years, many relevance propagation 
techniques have been proposed to propagate content information between web pages 
through web structure to improve the performance of web search. In this paper, we first 
propose a generic relevance propagation framework, and then provide a comparison 
study on the effectiveness and efficiency of various representative pro ... 

Keywords: hyperlink based score propagation, hyperlink based term propagation, 
relevance propagation, sitemap based score propagation, sitemap based term propagation 
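
As a rough illustration of the idea of hyperlink-based score propagation named in the keywords above, the following Python sketch performs one propagation step. It is not the generic framework proposed in the paper, whose details are cut off in the abstract. The update rule, the damping weight `alpha`, and the function name `propagate_scores` are all assumptions: each page keeps a fraction `alpha` of its own content-based relevance score and receives an equal share of the scores of the pages that link to it.

```python
def propagate_scores(scores, links, alpha=0.5):
    """One step of a simple score propagation (illustrative assumption).

    scores: {page: content-based relevance score}
    links:  {source page: [target pages]}
    alpha:  weight kept on a page's own score (hypothetical parameter)
    """
    received = {page: 0.0 for page in scores}
    for source, targets in links.items():
        if source not in scores or not targets:
            continue
        # Each source splits its score evenly among its out-links.
        share = scores[source] / len(targets)
        for target in targets:
            if target in received:
                received[target] += share
    return {
        page: alpha * scores[page] + (1 - alpha) * received[page]
        for page in scores
    }


initial = {"a": 1.0, "b": 0.0, "c": 0.0}
graph = {"a": ["b", "c"]}
updated = propagate_scores(initial, graph, alpha=0.5)
```

In this toy graph, pages "b" and "c" have no content match of their own but gain some relevance because the matching page "a" links to them.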

Charles W. Bachman reviews his career. Born in Kansas in 1924, Bachman attended 
high school in East Lansing, Michigan before joining the Army Anti Aircraft Artillery Corp, 
with which he spent two years in the Southwest Pacific Theater during World War II. 
After his discharge from the military, Bachman earned a B.Sc. in Mechanical Engineering 
in 1948, followed immediately by an M.Sc. in the same discipline, from the University of 
Pennsylvania. On graduation, he went to work for Do ... 

We consider the problem of estimating the size of a collection of documents using only a 
standard query interface. Our main idea is to construct an unbiased and low-variance 
estimator that can closely approximate the size of any set of documents defined by 
certain conditions, including that each document in the set must match at least one query 
from a uniformly sampleable query pool of known size, fixed in advance. Using this basic 
estimator, we propose two approaches to estimating corpus size. T ... 

Keywords: corpus size, estimator, random sampling 
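
One way to obtain an unbiased estimator of the kind described in the abstract above is sketched below in Python. The abstract is truncated, so this should be read as an assumption rather than the authors' exact construction. The sketch samples a query uniformly from the pool, gives each matching document the weight 1 / (number of pool queries that document matches), sums the weights, and multiplies by the pool size. The corpus is modelled as a list of term sets standing in for a real query interface, and the names `single_query_estimate` and `estimate_size` are hypothetical.

```python
import random


def single_query_estimate(query, corpus, pool):
    """Estimate from one sampled query.

    corpus: list of sets of terms (toy stand-in for a search interface)
    pool:   list of distinct single-term queries of known size
    """
    pool_set = set(pool)
    total = 0.0
    for terms in corpus:
        if query in terms:
            # Weight each hit by the inverse of how many pool queries
            # would have returned this same document.
            total += 1.0 / len(terms & pool_set)
    return len(pool) * total


def estimate_size(corpus, pool, samples, rng):
    """Average the single-query estimate over uniformly sampled queries."""
    draws = (
        single_query_estimate(rng.choice(pool), corpus, pool)
        for _ in range(samples)
    )
    return sum(draws) / samples


corpus = [{"a", "b"}, {"b"}, {"c", "a"}, {"z"}]
pool = ["a", "b", "c"]
# Averaging over every pool query gives the exact expectation.
expected = sum(single_query_estimate(q, corpus, pool) for q in pool) / len(pool)
approx = estimate_size(corpus, pool, samples=1000, rng=random.Random(0))
```

The expectation equals the number of documents that match at least one pool query (three in this toy corpus), because each such document contributes a total weight of 1 when summed over all queries. The document {"z"} matches no pool query and is therefore outside the set being measured.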